Mid-level Representation for Visual Recognition

Author

  • Moin Nabi
Abstract

Visual recognition is one of the fundamental challenges in AI, where the goal is to understand the semantics of visual data. Employing mid-level representations, in particular, has shifted the paradigm in visual recognition. Mid-level image/video representation involves discovering and training a set of mid-level visual patterns (e.g., parts and attributes) and representing a given image/video using them. The mid-level patterns can be extracted from images and videos using the motion and appearance information of visual phenomena. This thesis targets employing mid-level representations for different high-level visual recognition tasks, namely (i) image understanding and (ii) video understanding. For image understanding, we focus on the object detection/recognition task. We investigate discovering and learning a set of mid-level patches to be used for representing the images of an object category. Specifically, we employ discriminative patches in a subcategory-aware, webly-supervised fashion. Additionally, we study the outcomes of employing subcategory-based models for undoing dataset bias. For video understanding, we first study acquiring a mid-level motion-based representation (i.e., tracklets) to capture the commotion of crowd motion for the task of abnormality detection in crowds. Next, we study exploiting the dynamics of a dictionary of appearance-based models (i.e., Poselets) as a human motion representation for activity recognition. Our empirical studies show that employing richer mid-level representations can provide significant benefits for visual recognition (in both image and video understanding).
Our main contributions are as follows: (a) introducing a method for undoing dataset bias using subcategory-based models, (b) discovering and training webly-supervised, subcategory-aware discriminative patches for object recognition, (c) proposing a tracklet-based commotion measure for abnormality detection in crowds, and (d) introducing Temporal Poselet, a descriptor for group activity recognition.
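
As a rough illustration of the general idea of a mid-level representation (a bank of patch detectors, where each detector contributes one entry of the image descriptor), the following is a minimal sketch under stated assumptions: linear patch detectors applied to a 2-D grayscale image, with each detector's response max-pooled over all patch locations. The function names, the 8x8 patch size, the stride, and the max-pooling choice are illustrative assumptions; this is not the thesis's own pipeline, which learns webly-supervised, subcategory-aware discriminative patches.

```python
import numpy as np


def extract_patches(image, size, stride):
    """Slide a size x size window over a 2-D grayscale image and return flattened patches."""
    h, w = image.shape
    patches = [
        image[r:r + size, c:c + size].ravel()
        for r in range(0, h - size + 1, stride)
        for c in range(0, w - size + 1, stride)
    ]
    return np.stack(patches)  # shape: (num_patches, size * size)


def mid_level_descriptor(image, detectors, biases, size=8, stride=4):
    """Score every patch with every linear detector, then max-pool per detector.

    detectors: (num_detectors, size * size) weights of hypothetical pre-trained patch detectors
    biases:    (num_detectors,) detector offsets
    Returns one value per detector: its strongest response anywhere in the image.
    """
    patches = extract_patches(image, size, stride)
    scores = patches @ detectors.T + biases  # shape: (num_patches, num_detectors)
    return scores.max(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32))             # stand-in for a real grayscale image
    detectors = rng.standard_normal((5, 64))  # five hypothetical 8x8 detectors
    desc = mid_level_descriptor(image, detectors, np.zeros(5))
    print(desc.shape)  # one entry per detector
```

The resulting fixed-length vector can then be fed to any standard classifier; max-pooling is only one possible aggregation choice and makes each entry answer "how strongly does this pattern appear anywhere in the image".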

Similar resources

Recognition of Visual Events using Spatio-Temporal Information of the Video Signal

Recognition of visual events as a video analysis task has become popular in machine learning community. While the traditional approaches for detection of video events have been used for a long time, the recently evolved deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...

Latent semantic learning with structured sparse representation for human action recognition

This paper proposes a novel latent semantic learning method for extracting high-level latent semantics from a large vocabulary of abundant mid-level features (i.e. visual keywords) with structured sparse representation, which can help to bridge the semantic gap in the challenging task of human action recognition. To discover the manifold structure of mid-level features, we develop a graph-based...

Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm

Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...

Hierarchical Implicit Shape Modeling

In this paper, a new hierarchical approach for part-based object recognition is proposed. Object detection methods based on Implicit Shape Model (ISM) efficiently handle deformable objects, occlusions and clutters. The structure of each object in ISM is defined by a spring like graph, hence parts independently vote to object properties. We introduce hierarchical ISM in which structure of each o...

SuperPixel based mid-level image description for image recognition

This study proposes a mid-level feature descriptor and aims to validate improvement on image classification and retrieval tasks. In this paper, we propose a method to explore the conventional feature extraction techniques in the image classification pipeline from a different perspective where mid-level information is also incorporated in order to obtain a superior scene description. We hypothes...

Journal:
  • CoRR

Volume: abs/1512.07314

Publication date: 2015